{ "cells": [ { "cell_type": "markdown", "id": "d1f56416-391c-4842-9d59-1f88776d30c6", "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "source": [ "# Comparison of Preranked Gene Set Enrichment analysis(GSEA), traditional GSEA and fGSEA in Python and R\n", "\n", "Author: Shakira Agata \n", "\n", "This Jupyter notebook describes the steps for the execution and comparison of methods: preranked GSEA, traditional GSEA and fGSEA (Fast Gene Set Enrichment Analysis). My goal was to identify the potential difference in outcomes when using three different methods of GSEA: preranked GSEA from GSEApy, traditional GSEA from GSEApy and fGSEA from Bioconductor(1,2). I did this by comparing the outcomes in identified pathways, calculated Enrichment Scores (ES) and Normalized Enrichment Scores for the three methods. The comparison was made using the data for Ag+ and control at 24 hours from ArrayExpress dataset: E-MEXP-3583. This dataset looked at the effect of Ag+ and AgNPs on the gene expression in a human lung epithelial cell line A549. \n", "\n", "This notebook is subdivided into the following two sections:\n", "\n", "* Section 1: Preranked GSEA\n", " * Section 1.1: System preparation\n", " * Section 1.2: Generation of the RNK file\n", " * Section 1.3: Selection of the geneset\n", " * Section 1.4: Execution of GSEA\n", "* Section 2: Traditional GSEA\n", " * Section 2.1: Generation of expression data frame\n", " * Section 2.2: Generation of the CLS file\n", " * Section 2.3: Selection of the geneset\n", " * Section 2.4: Execution of GSEA\n", "* Section 3: fGSEA\n", " * Section 3.1 Magic R extension and system preparation\n", " * Section 3.2 Execution of fGSEA\n", "* Section 4: Session information for Python and R\n", " * Section 4.1: Python session information\n", " * Section 4.2: R session information" ] }, { "cell_type": "markdown", "id": "4205aa02-0287-4d9a-9d08-f57dc3b114c3", "metadata": {}, "source": [ "## Section 1: Preranked GSEA" ] }, { "cell_type": "markdown", "id": "e6046535-4348-4aea-8277-a4ec3de935fb", "metadata": {}, "source": [ "In this section, the preranked GSEA method will be used which allows users to select the ranking method for the comparison of two group of samples (e.g control and treatment). In this case, the differential gene expression results for the comparison of Ag+ to the control at 24h was used to allow for the calculation of the ranking by the log2FC. This ranking method is used in the formal documentation of GSEA and can be used as option in fGSEA and Clusterprofiler GSEA in Rstudio (1)." ] }, { "cell_type": "markdown", "id": "36049303-1f2e-485d-b064-8399053c25f9", "metadata": {}, "source": [ "### Section 1.1: System preparation" ] }, { "cell_type": "markdown", "id": "f724e382-75b0-4810-8490-ccc80c36da5c", "metadata": {}, "source": [ "**Step 1.** The necessary packages for using this pipeline are first installed." ] }, { "cell_type": "code", "execution_count": 6, "id": "c6a4210f-1bc7-48d5-aadd-4a322f79be52", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from gseapy.plot import gseaplot\n", "import gseapy as gp\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from gseapy import dotplot" ] }, { "cell_type": "code", "execution_count": 7, "id": "5afcdb41-97a8-487a-8122-3fa1349aef8d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: watermark in c:\\users\\shaki\\anaconda3\\lib\\site-packages (2.4.3)\n", "Requirement already satisfied: ipython>=6.0 in c:\\users\\shaki\\anaconda3\\lib\\site-packages (from watermark) (8.25.0)\n", "Requirement already satisfied: importlib-metadata>=1.4 in c:\\users\\shaki\\anaconda3\\lib\\site-packages (from watermark) (7.0.1)\n", "Requirement already satisfied: setuptools in c:\\users\\shaki\\anaconda3\\lib\\site-packages (from watermark) (69.5.1)\n", "Requirement already satisfied: zipp>=0.5 in c:\\users\\shaki\\anaconda3\\lib\\site-packages (from importlib-metadata>=1.4->watermark) (3.17.0)\n", "Requirement already satisfied: decorator in c:\\users\\shaki\\anaconda3\\lib\\site-packages (from ipython>=6.0->watermark) (5.1.1)\n", "Requirement already satisfied: jedi>=0.16 in c:\\users\\shaki\\anaconda3\\lib\\site-packages (from ipython>=6.0->watermark) (0.18.1)\n", "Requirement already satisfied: matplotlib-inline in c:\\users\\shaki\\anaconda3\\lib\\site-packages (from ipython>=6.0->watermark) (0.1.6)\n", "Requirement already satisfied: prompt-toolkit<3.1.0,>=3.0.41 in c:\\users\\shaki\\anaconda3\\lib\\site-packages (from ipython>=6.0->watermark) (3.0.43)\n", "Requirement already satisfied: pygments>=2.4.0 in c:\\users\\shaki\\anaconda3\\lib\\site-packages (from ipython>=6.0->watermark) (2.15.1)\n", "Requirement already satisfied: stack-data in c:\\users\\shaki\\anaconda3\\lib\\site-packages (from ipython>=6.0->watermark) (0.2.0)\n", "Requirement already satisfied: traitlets>=5.13.0 in c:\\users\\shaki\\anaconda3\\lib\\site-packages (from ipython>=6.0->watermark) (5.14.3)\n", "Requirement already satisfied: colorama in c:\\users\\shaki\\anaconda3\\lib\\site-packages (from ipython>=6.0->watermark) (0.4.6)\n", "Requirement already satisfied: parso<0.9.0,>=0.8.0 in c:\\users\\shaki\\anaconda3\\lib\\site-packages (from jedi>=0.16->ipython>=6.0->watermark) (0.8.3)\n", "Requirement already satisfied: wcwidth in c:\\users\\shaki\\anaconda3\\lib\\site-packages (from prompt-toolkit<3.1.0,>=3.0.41->ipython>=6.0->watermark) (0.2.5)\n", "Requirement already satisfied: executing in c:\\users\\shaki\\anaconda3\\lib\\site-packages (from stack-data->ipython>=6.0->watermark) (0.8.3)\n", "Requirement already satisfied: asttokens in c:\\users\\shaki\\anaconda3\\lib\\site-packages (from stack-data->ipython>=6.0->watermark) (2.0.5)\n", "Requirement already satisfied: pure-eval in c:\\users\\shaki\\anaconda3\\lib\\site-packages (from stack-data->ipython>=6.0->watermark) (0.2.2)\n", "Requirement already satisfied: six in c:\\users\\shaki\\anaconda3\\lib\\site-packages (from asttokens->stack-data->ipython>=6.0->watermark) (1.16.0)\n", "Note: you may need to restart the kernel to use updated packages.\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n", "[notice] A new release of pip is available: 24.3.1 -> 25.0.1\n", "[notice] To update, run: python.exe -m pip install --upgrade pip\n" ] } ], "source": [ "pip install watermark" ] }, { "cell_type": "markdown", "id": "e1aed12e-3d61-480f-9f55-537c1261e1a5", "metadata": {}, "source": [ "### Section 1.2: Generation of the RNK file" ] }, { "cell_type": "markdown", "id": "c34b56ea-2da2-44cc-86a4-c50c5a1bf6b4", "metadata": {}, "source": [ "**Step 2.** Next, you load the data from E-MEXP3583 into a dataframe." ] }, { "cell_type": "code", "execution_count": 10, "id": "1a2bb5d5-a245-4ddd-92ff-5e4f7ea483ce", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | GeneID | \n", "meanExpr | \n", "log2FC | \n", "log2FC SE | \n", "p-value | \n", "adjpvalue | \n", "
---|---|---|---|---|---|---|
0 | \n", "4490 | \n", "9.925881 | \n", "3.935193 | \n", "0.206737 | \n", "9.842270e-11 | \n", "0.000002 | \n", "
1 | \n", "4495 | \n", "9.561522 | \n", "4.109887 | \n", "0.243653 | \n", "3.924060e-10 | \n", "0.000004 | \n", "
2 | \n", "79974 | \n", "6.381454 | \n", "2.532636 | \n", "0.174021 | \n", "2.094514e-09 | \n", "0.000011 | \n", "
3 | \n", "4489 | \n", "12.843381 | \n", "2.498127 | \n", "0.172052 | \n", "2.150589e-09 | \n", "0.000011 | \n", "
4 | \n", "1638 | \n", "6.429810 | \n", "2.349836 | \n", "0.170834 | \n", "3.954792e-09 | \n", "0.000016 | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
20513 | \n", "391257 | \n", "6.462645 | \n", "-0.000055 | \n", "0.146291 | \n", "9.996750e-01 | \n", "0.999870 | \n", "
20514 | \n", "10078 | \n", "8.351476 | \n", "-0.000037 | \n", "0.128166 | \n", "9.997478e-01 | \n", "0.999894 | \n", "
20515 | \n", "84328 | \n", "9.130185 | \n", "-0.000033 | \n", "0.167015 | \n", "9.998260e-01 | \n", "0.999923 | \n", "
20516 | \n", "253018 | \n", "7.162925 | \n", "-0.000015 | \n", "0.199487 | \n", "9.999336e-01 | \n", "0.999982 | \n", "
20517 | \n", "440737 | \n", "10.365751 | \n", "0.000003 | \n", "0.187502 | \n", "9.999848e-01 | \n", "0.999985 | \n", "
20518 rows × 6 columns
\n", "\n", " | GeneID | \n", "Rank | \n", "
---|---|---|
0 | \n", "4490 | \n", "3.935193 | \n", "
1 | \n", "4495 | \n", "4.109887 | \n", "
2 | \n", "79974 | \n", "2.532636 | \n", "
3 | \n", "4489 | \n", "2.498127 | \n", "
4 | \n", "1638 | \n", "2.349836 | \n", "
... | \n", "... | \n", "... | \n", "
20513 | \n", "391257 | \n", "-0.000055 | \n", "
20514 | \n", "10078 | \n", "-0.000037 | \n", "
20515 | \n", "84328 | \n", "-0.000033 | \n", "
20516 | \n", "253018 | \n", "-0.000015 | \n", "
20517 | \n", "440737 | \n", "0.000003 | \n", "
20518 rows × 2 columns
\n", "\n", " | Name | \n", "Term | \n", "ES | \n", "NES | \n", "NOM p-val | \n", "FDR q-val | \n", "FWER p-val | \n", "Tag % | \n", "Gene % | \n", "Lead_genes | \n", "
---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "prerank | \n", "WP_COPPER_HOMEOSTASIS | \n", "0.758562 | \n", "2.676049 | \n", "0.0 | \n", "0.0 | \n", "0.0 | \n", "12/53 | \n", "2.46% | \n", "4495;4490;4501;4489;4494;4496;4493;84560;4500;... | \n", "
1 | \n", "prerank | \n", "WP_ZINC_HOMEOSTASIS | \n", "0.816562 | \n", "2.670822 | \n", "0.0 | \n", "0.0 | \n", "0.0 | \n", "15/37 | \n", "5.27% | \n", "4495;4490;4501;4489;4494;7780;4496;7779;4493;8... | \n", "
2 | \n", "prerank | \n", "WP_GENES_RELATED_TO_PRIMARY_CILIUM_DEVELOPMENT... | \n", "-0.546802 | \n", "-2.576323 | \n", "0.0 | \n", "0.0 | \n", "0.0 | \n", "60/101 | \n", "27.17% | \n", "55764;8100;132320;23322;57539;23059;79659;5508... | \n", "
3 | \n", "prerank | \n", "WP_BARDETBIEDL_SYNDROME | \n", "-0.540727 | \n", "-2.417239 | \n", "0.0 | \n", "0.0 | \n", "0.0 | \n", "42/86 | \n", "20.25% | \n", "55764;132320;2736;5311;5727;57539;23059;79659;... | \n", "
4 | \n", "prerank | \n", "WP_CILIOPATHIES | \n", "-0.470576 | \n", "-2.415103 | \n", "0.0 | \n", "0.0 | \n", "0.0 | \n", "78/181 | \n", "20.38% | \n", "1063;55764;132320;4751;2736;5311;4867;219844;2... | \n", "
\n", " | Control | \n", "Control | \n", "Treatment | \n", "Treatment | \n", "
---|---|---|---|---|
EntrezID | \n", "\n", " | \n", " | \n", " | \n", " |
1 | \n", "7.941181 | \n", "8.006905 | \n", "7.990053 | \n", "8.326014 | \n", "
10 | \n", "4.473060 | \n", "4.561733 | \n", "4.546220 | \n", "4.573649 | \n", "
100 | \n", "10.246258 | \n", "10.481636 | \n", "10.297045 | \n", "10.176861 | \n", "
1000 | \n", "11.149919 | \n", "11.145671 | \n", "11.090569 | \n", "11.116654 | \n", "
10000 | \n", "10.274072 | \n", "10.243285 | \n", "10.226056 | \n", "10.054899 | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
9991 | \n", "10.900741 | \n", "10.551433 | \n", "10.444575 | \n", "10.473755 | \n", "
9992 | \n", "5.068302 | \n", "5.348781 | \n", "5.664488 | \n", "5.512333 | \n", "
9993 | \n", "9.888579 | \n", "10.014469 | \n", "9.710480 | \n", "9.798821 | \n", "
9994 | \n", "7.720305 | \n", "7.575626 | \n", "7.660303 | \n", "7.523620 | \n", "
9997 | \n", "8.247325 | \n", "8.484476 | \n", "8.669681 | \n", "8.838813 | \n", "
20518 rows × 4 columns
\n", "